ABSTRACT

We present the BOSS Lyman-α (Lyα) Forest Sample from SDSS Data Release 9, comprising 54,468
quasar spectra with z_qso > 2.15 suitable for Lyα forest analysis. This data set probes the intergalactic
medium with absorption redshifts 2.0 < z_α < 5.7 over an area of 3275 square degrees, and encompasses
an approximate comoving volume of 20 h^-3 Gpc^3. With each spectrum, we have included several
products designed to aid in Lyα forest analysis: improved sky masks that flag pixels where data may
be unreliable, corrections for known biases in the pipeline estimated noise, masks for the cores of
damped Lyα systems and corrections for their wings, and estimates of the unabsorbed continua so
that the observed flux can be converted to a fractional transmission. The continua are derived using a
principal component fit to the quasar spectrum redwards of restframe Lyα (λ > 1216 Å), extrapolated
into the forest region and normalized by a linear function to fit the expected evolution of the Lyα
forest mean-flux. The estimated continuum errors are < 5% rms. We also discuss possible systematics
arising from uncertain spectrophotometry and artifacts in the flux calibration; global corrections for
the latter are provided. Our sample provides a convenient starting point for users to analyze clustering
in BOSS Lyα forest data, and it provides a fiducial data set that can be used to compare results from
different analyses of baryon acoustic oscillations in the Lyα forest. The full data set is available from
the SDSS-III DR9 web site.

Subject headings: intergalactic medium — quasars: emission lines — quasars: absorption lines - 
methods: data analysis 
1. INTRODUCTION



The Lyman-α (Lyα) forest (Lynds 1971) is the ubiquitous absorption pattern observed in the spectra
of high-redshift quasars, caused by Lyα λ1216 absorption of residual neutral hydrogen embedded in a
highly photo-ionized (n_HI/n_H ≲ 10^-5) intergalactic medium (see e.g., Gunn & Peterson 1965; Rauch
1998; Meiksin 2009). Over the past two decades, studies using both numerical and semi-analytic
methods have established that the Lyα forest directly traces the underlying dark matter fluctuations
in inter-galactic space (Cen et al. 1994; Bi et al. 1995; Zhang et al. 1995; Hernquist et al. 1996;
Miralda-Escude et al. 1996; Bi & Davidsen 1997; Hui et al. 1997; Theuns et al. 1998). This theoretical
insight has enabled the Lyα forest to be used as a cosmological probe of the high-redshift (z > 2)
universe (e.g., Croft et al. 1998; McDonald et al. 2000; Croft et al. 2002; Zaldarriaga et al. 2003;
Viel et al. 2004; McDonald et al. 2005, 2006).

In particular, the picture of the Lyα forest as a continuous tracer of the underlying dark matter
density implies that observers no longer must resolve individual forest lines to measure large-scale
correlations (Croft et al. 1998; Weinberg et al. 2003). This advance enables the use of
moderate-resolution spectra that do not fully resolve the Lyα forest absorption to perform
measurements of large-scale structure at z > 2. McDonald et al. (2006) used a sample of 3035
moderate-resolution Lyα forest spectra from the Sloan Digital Sky Survey (SDSS; York et al. 2000) to
measure the 1-dimensional flux power spectrum at z = 2.2-4.2, allowing constraints to be placed on
the linear matter power spectrum (McDonald et al. 2005; Seljak et al. 2005) and neutrino masses
(Seljak et al. 2006). At higher quasar sightline densities, correlations can be measured in the
transverse direction across different sightlines. McDonald & Eisenstein (2007) proposed that
three-dimensional measurements of the Lyα forest flux correlation could be used to measure the baryon
acoustic oscillation (BAO) signature at scales of ~100 h^-1 Mpc.

One of the key goals of the Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al. 2012) of
SDSS-III (Eisenstein et al. 2011) is to carry out precision BAO measurements from the Lyα forest at
z ≈ 2.5; for recent cosmological results from the BOSS galaxy redshift survey see, e.g., Anderson et
al. (2012); Sanchez et al. (2012); Reid et al. (2012). Over its projected 4.5-year survey period, BOSS
aims to obtain spectra of 170,000 quasars with z > 2, with an areal density of 15-20 deg^-2. The first
public release of BOSS spectra was through SDSS Data Release 9 (DR9; Ahn et al. 2012) in July 2012,
comprising the first 1.5 years of BOSS observations spanning Dec 2009 - July 2011. DR9 comprises
535,995 new galaxy spectra and 102,100 quasar spectra at all redshifts, covering 3275 deg^2 of the
sky. At the time of writing, the BOSS data have already provided the first measurements of
large-scale 3-dimensional correlations in the Lyα forest (Slosar et al. 2011), and we have recently
reported the first BAO detection from the Lyα forest (Busca et al. 2012), yielding a measurement of
the expansion rate at t ~ 3 Gyr, intermediate between the recombination epoch probed by the cosmic
microwave background and the "acceleration era" beginning at z ~ 0.8, or t ≈ 6 Gyr. Because Lyα
forest BAO measurement is a novel endeavor and a central goal of BOSS, the collaboration is carrying
out the first analyses using two largely independent methodologies and codes; results from the
alternative BAO analysis will be presented by Slosar et al. (2012).

The spectra used in these papers are all available via DR9, and the DR9 quasar catalog is described
and presented by Paris et al. (2012). However, there are a number of complex steps between a set of
quasar spectra and a cosmological analysis of the Lyα forest, including flagging unreliable data,
removing or correcting regions affected by damped Lyα absorbers (DLAs) or broad absorption lines
(BALs), accurately quantifying the noise, and determining the unabsorbed continuum baseline. The
primary purpose of this paper is to present a data set for which all of the above steps have been
implemented, drawing on the detailed internal investigations by the BOSS collaboration, so that users
can easily perform their own Lyα forest analyses. Our quasar continua predict the intrinsic quasar
flux with errors at the < 5% root-mean-squared (rms) level. For each spectrum, we introduce a
pixel-level mask to flag regions that may be affected by data reduction problems, sky emission lines,
DLAs, BALs, and non-Lyα absorbers. The pipeline noise estimates are known to underestimate the true
noise in the spectra by up to 15% at wavelengths relevant to most of our Lyα forest data
(3600 Å < λ < 5500 Å); we include corrections to remove these biases in the estimated pipeline noise.
This sample thus removes or corrects for the most obvious systematics that might affect a Lyα flux
correlation analysis, although these must be assessed in more detail in the context of any particular
study.

The Busca et al. (2012) and Slosar et al. (2012) papers each employ their own data selection criteria
and quasar continuum treatments for their primary BAO measurements. However, an additional purpose
of the present study is to provide a fiducial sample and continuum fit that can be used to compare
the results from different methods. Both papers therefore present additional BAO measurements for
the Lyα forest sample and continua presented here.

Our sample comprises 54,468 BOSS spectra that probe the Lyα forest in the redshift range
2.0 < z_α < 5.7 (where 1 + z_α = λ/1215.67 Å) at a typical sky area density of ~16 sightlines per
square degree (Ross et al. 2011). The co-moving volume encompassed by these sightlines is

    V = \int\!\!\int \frac{(1+z)^2 \, d_A^2(z)}{H(z)} \, d\Omega \, dz \approx 20 \, h^{-3} \, \mathrm{Gpc}^3 , \qquad (1)

where Ω is the solid angle, H(z) is the Hubble expansion parameter, d_A is the angular diameter
distance, and we have taken the integral over the redshift range 2 < z < 3.5 assuming a ΛCDM universe
with Ω_Λ = 0.7, Ω_m = 0.3, and H_0 = 70 km s^-1 Mpc^-1, consistent with WMAP 7-year results (Komatsu
et al. 2011).
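As a rough numerical check of Equation (1), the following sketch integrates the comoving volume element for the quoted cosmology and survey area. It is not the authors' code: it assumes a flat universe with radiation neglected, writes the factor of c explicitly so that c/H(z) is a length, and sets H_0 = 100 km s^-1 Mpc^-1 so that distances come out in h^-1 Mpc. The function names are illustrative.

```python
import math

C_KM_S = 299792.458          # speed of light in km/s
H0 = 100.0                   # km/s/Mpc, so that distances are in h^-1 Mpc
OMEGA_M, OMEGA_L = 0.3, 0.7  # flat LambdaCDM values quoted in the text
AREA_DEG2 = 3275.0           # DR9 footprint quoted in the text


def hubble(z):
    """H(z) in km/s/Mpc for flat LambdaCDM (radiation neglected)."""
    return H0 * math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_volume(z_min, z_max, area_deg2, n_steps=2000):
    """Midpoint-rule integral of c (1+z)^2 d_A^2 / H over redshift and
    solid angle, returned in (h^-1 Mpc)^3."""
    solid_angle = area_deg2 * (math.pi / 180.0) ** 2  # steradians
    dz = z_max / n_steps
    d_c = 0.0     # comoving distance accumulated from z = 0
    volume = 0.0  # volume per steradian
    for i in range(n_steps):
        z = (i + 0.5) * dz
        step = C_KM_S / hubble(z) * dz   # comoving distance increment
        d_mid = d_c + 0.5 * step         # comoving distance at the midpoint
        if z >= z_min:
            # In a flat universe (1+z) d_A equals the comoving distance.
            volume += d_mid ** 2 * step
        d_c += step
    return solid_angle * volume


if __name__ == "__main__":
    v_gpc3 = comoving_volume(2.0, 3.5, AREA_DEG2) / 1.0e9
    print(f"V ~ {v_gpc3:.1f} h^-3 Gpc^3")
```

Under these assumptions the result is close to the 20 h^-3 Gpc^3 quoted above.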

This paper is organized as follows: § 2 summarizes the BOSS survey and provides relevant technical
references; § 3 presents the basic selection of suitable spectra from the overall BOSS quasar sample;
§ 4 describes the per-spectrum products such as continua, masks, and corrections. We then describe
several systematics of which users need to be aware (§ 5), before providing information on data
access and usage (§ 6).

2. SUMMARY OF BOSS SPECTRA 

BOSS (Dawson et al. 2012) is one of four spectroscopic surveys^28 in SDSS-III (Eisenstein et al.
2011) conducted on the 2.5-meter Sloan telescope (Gunn et al. 2006) at Apache Point Observatory, New
Mexico. The target selections in all these surveys were largely based on the SDSS imaging (Fukugita
et al. 1996; Pier et al. 2003; Gunn et al. 2006) that was completed in SDSS DR8 (Aihara et al. 2011).
The BOSS spectra are obtained by twin spectrographs inspired by the original SDSS spectrograph design
(Smee et al. 2012), which were completed in 2009 with improved volume phase holographic gratings, new
CCDs, more fibers, and smaller fiber diameter relative to the SDSS instruments. The improvements
produced roughly a factor of two increase in instrument throughput and roughly a factor of two
decrease in sky background, enabling studies of a larger number of faint galaxies and quasars than
was possible in SDSS. Both spectrographs separate the light into a blue and a red camera, covering
the wavelength range of 361 nm to 1014 nm with a resolving power λ/Δλ ranging from 1300 at the blue
end to 2600 at the red end.

As described in Dawson et al. (2012), a typical plate is designed with 80 "sky" fibers assigned to
locations with no detected objects from SDSS imaging to provide an estimate of the sky background. In
addition, each plate includes 20 "standard star" fibers that are assigned to objects photometrically
classified as F stars to calibrate the spectral response of the instrument. About 160-200 fibers
(40 deg^-2) are assigned to quasar candidates to probe neutral hydrogen via absorption in the Lyα
forest. The photometric classification and selection of quasar candidates for BOSS spectroscopy
produces 15-18 z > 2.15 quasars deg^-2 (see Ross et al. 2012).

Exposure times for each plate are determined during 
observations to obtain a uniform depth across the sur- 
vey; on average, a plate is observed for five individual 
exposures of 15 minutes each. The data are processed 
and calibrated by a data reduction pipeline referred to
as "idlspec2d" (Schlegel et al., in prep.). The functions
of idlspec2d that are of consequence to Lyα studies
occur primarily in the first stage of the pipeline, where
data are extracted from the CCD images. In this stage, 
the variance for each pixel is estimated using read noise 
and the observed photon counts, sky background is sub- 
tracted using a model derived from the sky fibers, and 
flux calibration is performed using the spectra from the 
standard stars. Each exposure produces a sample of 
independent, flux-calibrated spectra for each object on 
the plate. These spectra are wavelength sampled cor- 
responding to the native CCD row spacing, which can 
vary from exposure to exposure due to flexure and focus 
changes. For each object, the individual flux calibrated 
spectra from each exposure are compared to the "pri- 
mary" spectrum with the highest signal-to-noise ratio. A 
low-order polynomial is derived to provide a wavelength 
dependent flux correction of each individual spectrum to 
match the spectrophotometry of the primary exposure. 



28 BOSS, SEGUE-2, MARVELS, and APOGEE; see Eisenstein et al. (2011).



Finally, the individual spectra are combined into a single spectrum that is binned into vacuum
wavelength pixels of Δlog10(λ) = 10^-4, i.e. Δv = 69.02 km s^-1. Each co-added spectrum is
automatically redshifted and classified in the final stage of idlspec2d (Bolton et al. 2012).
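The relation between the logarithmic pixel spacing and the velocity width can be sketched as follows. This is an illustrative calculation using the first-order relation Δv = c ln(10) Δlog10(λ); the function names are hypothetical and not part of any BOSS software.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
DLOGLAM = 1.0e-4      # pixel width in log10(wavelength), from the text


def pixel_velocity_width(dloglam=DLOGLAM):
    """Velocity width of one log-spaced pixel: dv = c * ln(10) * dlog10(lambda)."""
    return C_KM_S * math.log(10.0) * dloglam


def wavelength_grid(loglam_start, n_pix, dloglam=DLOGLAM):
    """Vacuum wavelengths (Angstrom) of n_pix log-spaced pixels."""
    return [10.0 ** (loglam_start + i * dloglam) for i in range(n_pix)]


if __name__ == "__main__":
    print(f"dv = {pixel_velocity_width():.2f} km/s per pixel")
    print(wavelength_grid(math.log10(3600.0), 3))
```

With these constants the pixel width evaluates to approximately 69.0 km s^-1, in line with the value quoted above; because the grid is uniform in log10(λ), every pixel has the same velocity width regardless of wavelength.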

A spectrum of an object is identified by its plate, fiber number, and the modified Julian day (MJD)
of the last exposure contributing to the coadd. A small number of objects have been
multiply-observed^29, and each has multiple spectra with different plate-MJD-fiber combinations.
SDSS-III Data Release 9 (DR9; Ahn et al. 2012) makes these spectra available as one FITS-format file
per plate-MJD-fiber (with the file prefix "spec"), enabling re-distribution of the exact subset of
the spectra used for a particular analysis or catalog. The full version of these files includes both
the coadded spectrum and the individual exposure spectra; the "lite" version does not include the
individual exposures. The format of these files is described in detail within the SDSS-III
website^30. Header Data Unit (HDU) 1 of these files contains vectors with the vacuum wavelength
solution (in logarithmic units), co-added observed flux density (in units of
10^-17 erg s^-1 cm^-2 Å^-1), estimated inverse variance of the noise, and bit mask vectors; these are
listed in the top half of Table 1. The spectra released with this paper (labeled with the file prefix
"speclya") expand this format to include additional masks, noise corrections, Damped Lyα (DLA) system
corrections, and a continuum fit as described in § 4. Only HDU 1 is changed; other HDUs are the same
as the original DR9 files.

3. SAMPLE SELECTION 

In this section, we describe the spectrum-level cuts used to select a useful sample of Lyα forest
spectra from the overall BOSS DR9 sample.

We use as a parent catalog the BOSS DR9 quasar catalog of 87,822 objects visually confirmed as
quasars (Paris et al. 2012, hereafter DR9Q). In addition to identifying quasars from the targeted
candidates and flagging artifacts in the data, the visual inspection process of DR9Q also provides a
visual refinement of the pipeline redshift estimates as well as identification of broad absorption
line (BAL) quasars and damped Lyα (DLA) absorbers. The redshift distribution of the z_qso > 2 quasars
is shown in Figure 1, where we have adopted the visual inspection redshift, Z_VI, as the quasar
redshift (this definition is used throughout the paper unless noted otherwise). DR9Q lists only
unique quasars; in the case of quasars that have multiple spectra, the catalog lists only the
spectrum (i.e. plate-MJD-fiber combination) with the highest signal-to-noise ratio (SNR).

It is clear from Figure 1 that the BOSS quasar target selection (Ross et al. 2012; Bovy et al. 2011)
has selected an unprecedented number of high-redshift (z_qso > 2) quasars with accessible Lyα forest.
In principle, the minimum useable quasar redshift is that at which the quasar restframe Lyα redshifts
past the 3600 Å blue-end cutoff of the BOSS spectrograph, z_qso > 1.96. The absorber redshift
distribution of all nominal Lyα forest pixels in DR9Q is illustrated by the black histogram in

29 Most notably plates 3615 and 3647, which together have covered the same 7.1 deg^2 field 6 times
in DR9.

30 http://data.sdss3.org/datamodel/files/BOSS_SPECTRO_REDUX/
TABLE 1

Spectral Products in HDU 1 of 'speclya'

Product       Description

Standard Pipeline Products
FLUX          Coadded and calibrated flux density in units of 10^-17 erg s^-1 cm^-2 Å^-1
LOGLAM        Logarithm of wavelength in angstroms
IVAR          Inverse variance of flux
AND_MASK      AND mask (a)
OR_MASK       OR mask (a)
WDISP         Wavelength dispersion in dloglam units
SKY           Subtracted sky flux density in units of 10^-17 erg s^-1 cm^-2 Å^-1
MODEL         Pipeline best model fit used for classification and redshift (b)

Value-Added Products
MASK_COMB     Combined mask incorporating pipeline masks, sky-line masks, and DLA masks (c)
NOISE_CORR    Pipeline noise corrections
DLA_CORR      Flux corrections for known DLAs
CONT          Estimated quasar continuum in 1040-1600 Å restframe, in units of 10^-17 erg s^-1 cm^-2 Å^-1

(a) See http://www.sdss3.org/dr9/algorithms/bitmask_sppixmask.php for a detailed description of the BOSS spectrum bitmask system
(b) See Bolton et al. (2012)
(c) See Table 4 for a listing of combined masks



[Figure 1: histogram; x-axis: quasar redshift]

Fig. 1. - Redshift distribution of high-redshift (z_qso > 2) quasars
in DR9Q, and in the present Lyα Forest Value-Added Sample. The
axes of this figure exclude 22,617 DR9Q quasars with z_qso < 2 and
22 quasars with z_qso > 5.
Fig. 2. - Absorber redshift distribution of Lyα forest pixels
(1041-1185 Å restframe) in BOSS DR9. The black histogram
shows all nominal Lyα forest pixels from DR9Q, while the red
histogram shows the final distribution in the present Lyα forest
sample, with the pixel-level masks applied (see § 4.1). The sharp
dips in the distribution represent pixels which have
been masked due to sky lines.



TABLE 2

Selection Cuts for Lyα Forest Sample

Description                  Number of Spectra

DR9Q Quasars                 87,822
z_qso < 2.15                 -25,891
BAL quasars                  -5,848
Low SNR                      -924
Too many masked pixels       -170
Negative continuum           -521

Total                        54,468

Figure 2. However, for Lyα forest analysis we want to ensure that each sightline contains a
reasonable number of Lyα forest pixels in order to allow stable continuum fitting and cross-checks
involving line-of-sight fluctuations. We therefore set the minimum quasar redshift to z_qso > 2.15:
this ensures at least N_pix ≈ 157 useable Lyα forest pixels (corresponding to a minimum velocity
path-length of Δv = 10800 km s^-1) in each sightline^31. This criterion excludes less than 0.9% of
all possible Lyα forest pixels, which are in any case from the noisy blue-end of the BOSS
spectrographs, and hence carry less weight in any analysis. The resulting pixel distribution is
illustrated by the red curve in Figure 2, although this also includes pixel-level cuts (§ 4.1). For
consistency with the SDSS Lyα forest analysis of McDonald et al. (2006), we have defined the Lyα
forest region in each sightline to be 1041-1185 Å in the quasar restframe. This range conservatively
avoids the quasar proximity zone at the red-end and the quasar Lyβ emission line at the blue-end.
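The redshift and pixel-count figures quoted above follow from simple arithmetic on the forest definition, which the sketch below reproduces. The constants are taken from the text; the function names are illustrative only.

```python
import math

LYA_REST = 1215.67                       # Lya rest wavelength in Angstrom
BLUE_CUTOFF = 3600.0                     # nominal BOSS blue-end cutoff in Angstrom
FOREST_MIN, FOREST_MAX = 1041.0, 1185.0  # restframe forest range in Angstrom
DLOGLAM = 1.0e-4                         # pixel width in log10(wavelength)


def min_quasar_redshift():
    """Redshift at which restframe Lya reaches the blue-end cutoff."""
    return BLUE_CUTOFF / LYA_REST - 1.0


def n_forest_pixels(z_qso):
    """Number of forest pixels redward of the blue cutoff for a quasar at z_qso."""
    lam_lo = max(BLUE_CUTOFF, FOREST_MIN * (1.0 + z_qso))
    lam_hi = FOREST_MAX * (1.0 + z_qso)
    if lam_hi <= lam_lo:
        return 0.0
    return (math.log10(lam_hi) - math.log10(lam_lo)) / DLOGLAM


if __name__ == "__main__":
    print(f"minimum useable z_qso = {min_quasar_redshift():.2f}")
    print(f"N_pix at z_qso = 2.15: {n_forest_pixels(2.15):.0f}")
```

Multiplying the roughly 157 pixels at z_qso = 2.15 by the 69 km s^-1 pixel width gives a path-length of about 10800 km s^-1, matching the value quoted in the text.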

In addition, broad absorption line (BAL) troughs may affect our continuum fitting and possibly
introduce intrinsic quasar absorption into the Lyα forest region. Therefore, we discard the 5,848
quasars visually flagged as BAL quasars (BAL_FLAG_VI = 1) in DR9Q.

Since our continuum-estimation technique uses the
1030 Å < λ_rest < 1600 Å range in the quasar restframe
spectrum, we also discard spectra in which more than
20% of the pixels within this region are masked by the
pipeline (see § 4.1.1). Similarly, we require that no more

31 Where N_pix = (log10 λ_max - log10 λ_min)/10^-4, λ_min = 3600 Å
is the nominal BOSS blue-end cutoff, and λ_max = 1185 × (1 +
2.15) = 3733 Å is set by the red-end of the quasar Lyα forest
region.
TABLE 3

Description of BOSSLyaDR9 Catalog

Column          Format   Description

SDSS_NAME       A19      SDSS-DR9 designation
RA              F11.     Right ascension (J2000)
DEC             F11.     Declination (J2000)
THING_ID        I10      Unique identifier
PLATE                    Plate number
MJD             I6       Spectroscopic MJD
FIBER           I5       Fiber number
Z_VI            F9.4     Visual inspection redshift from DR9Q
Z_PIPE          F9.4     BOSS pipeline redshift
SNR             F9.4     Median SNR (1268-1380 Å rest)
SNR_LYA         F9.4     Median SNR (1041-1185 Å rest)
CHISQ_CONT      F9.4     Reduced chi-squared of continuum fit (1216-1600 Å rest)
CONT_FLAG       I2       Continuum visual inspection flag
CONT_TEMPLATE   A8       Quasar template used
Z_DLA           F9.4     DLA absorption redshift
LOG_NHI         F9.4     Logarithm of DLA H I column density in cm^-2



than 20% of pixels within the 1041 Å < λ_rest < 1185 Å
Lyα forest region are masked by the pipeline (see § 4.1).

Next, we make a cut based on the SNR of the spectra. While the SNR requirements for 3D Lyα forest
flux correlation analysis are modest (McDonald & Eisenstein 2007; McQuinn & White 2011), it is
difficult to estimate continua from extremely noisy spectra. In the worst cases, even normalization
is impossible. We therefore require our sample spectra to have a minimum median SNR of S/N > 0.5 per
pixel evaluated over 1268-1380 Å restframe (redwards of the quasar Lyα line) and a minimum median Lyα
forest SNR of S/N > 0.2 per pixel (after applying the noise corrections described in § 4.2). We also
cut spectra with more than one DLA (see § 4.3) within the Lyα forest, but none of the objects within
the sample violated this criterion. Spectra that have continua (see § 4.4) with negative regions are
also discarded; this removes 521 objects that satisfy all other criteria in the sample, although
these are all low-SNR spectra (S/N < 1 per pixel in the forest). In Figure 3 we show the resulting
median spectral SNR in our sample. Note that the effective SNR of the Lyα forest region within each
quasar spectrum is usually significantly lower than the red-side SNR due to IGM absorption and the
increasing noise at the blue-end of the BOSS spectrographs.
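The spectrum-level cuts described in this section can be collected into a single predicate, sketched below. The record type and its field names are hypothetical (they are not columns of the released catalog), and the thresholds are those quoted in the text; treating a quasar at exactly z_qso = 2.15 as accepted is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SpectrumSummary:
    """Hypothetical per-spectrum summary; field names are illustrative only."""
    z_vi: float                 # visual inspection redshift
    bal_flag_vi: int            # 1 if visually flagged as a BAL quasar
    snr_red: float              # median S/N per pixel, 1268-1380 A restframe
    snr_lya: float              # median S/N per pixel in the forest, noise-corrected
    frac_masked_cont: float     # masked fraction within 1030-1600 A restframe
    frac_masked_forest: float   # masked fraction within 1041-1185 A restframe
    n_dla: int                  # number of DLAs within the forest
    continuum_negative: bool    # True if the fitted continuum has negative regions


def passes_cuts(s):
    """Return True if the spectrum survives all cuts quoted in the text."""
    return (
        s.z_vi >= 2.15
        and s.bal_flag_vi == 0
        and s.frac_masked_cont <= 0.20
        and s.frac_masked_forest <= 0.20
        and s.snr_red > 0.5
        and s.snr_lya > 0.2
        and s.n_dla <= 1
        and not s.continuum_negative
    )


if __name__ == "__main__":
    example = SpectrumSummary(2.5, 0, 3.0, 1.0, 0.05, 0.0, 0, False)
    print(passes_cuts(example))
```

The cuts are independent of one another, so the order in which they are evaluated only affects how many spectra are attributed to each row of a tally such as Table 2.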

Our final sample consists of 54,468 unique quasar spectra suitable for Lyα forest analysis, with our
cuts summarized in Table 2. The redshift distribution of all useable Lyα forest pixels within our
sample is shown by the red histogram in Figure 2; this also includes all pixel-level cuts described
in subsequent sections of this paper. These objects are listed in a catalog, BOSSLyaDR9_cat
(available in both ASCII and FITS formats). The contents are summarized in Table 3. The catalog and
the individual spectra, described in the next section, can be downloaded from the SDSS-III
website^32.
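The bookkeeping in Table 2 can be checked with a short tally. The numbers below are copied from that table; the function name is illustrative.

```python
PARENT_SAMPLE = 87822  # DR9Q quasars

# (description, number of spectra removed), in the order listed in Table 2
CUTS = [
    ("z_qso < 2.15", 25891),
    ("BAL quasars", 5848),
    ("Low SNR", 924),
    ("Too many masked pixels", 170),
    ("Negative continuum", 521),
]


def remaining_after_cuts(parent, cuts):
    """Return (description, spectra remaining) after each successive cut."""
    remaining = parent
    history = []
    for description, removed in cuts:
        remaining -= removed
        history.append((description, remaining))
    return history


if __name__ == "__main__":
    for description, remaining in remaining_after_cuts(PARENT_SAMPLE, CUTS):
        print(f"{description:<24s} {remaining:>7d}")
```

The final entry reproduces the 54,468 spectra of the sample.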

4. PER-SPECTRUM PRODUCTS 

In this section, we describe our expanded version of the BOSS high-redshift quasar spectra, intended
to assist in Lyα forest analyses.

We use as a starting point the per-object^33 'lite' coadd format released in SDSS DR9 (Ahn et al.
2012), which has the file prefix "spec". The standard products packaged with this spectral format
include the vacuum wavelength solution (in logarithmic units), co-added observed flux density (in
units of 10^-17 erg s^-1 cm^-2 Å^-1), estimated inverse variance of the noise, and bit mask vectors;
these quantities are listed in the top half of Table 1.

32 http://www.sdss3.org/dr9/algorithms/lyaf_sample.php

33 These files are not strictly 'per-object' as a small number of
multiply-observed objects have multiple plate-MJD-fiber combinations.

Fig. 3. - Signal-to-noise ratio distribution of the spectra in our
Lyα forest sample, evaluated both in the Lyα forest region (green)
and redwards of quasar restframe Lyα (black). Note that there are
675 spectra with S/N > 20 per pixel over 1268-1380 Å.

However, a Lyα forest analysis needs to take into account various systematics, e.g. a detailed
understanding of the pixel noise, masking of damped Lyα absorbers (DLAs), and continuum fitting. In
this section, we describe the additional products intended to assist in Lyα forest analysis, which
consist of four primary components: (1) a continuum estimate for each quasar using the mean-flux
regulated principal component analysis (MF-PCA) technique, (2) a noise correction to enable better
noise estimates, (3) a simplified mask system to flag problematic pixels, and (4) corrections for
intervening DLAs. These value-added products are packaged together with the original "lite" format
products into new per-object spectra with the prefix "speclya". While this packaging makes the BOSS
Lyα forest data convenient to use, we emphasize that we have not applied the new products to the
data, and users must perform the necessary operations themselves.
TABLE 4

Combined Maskbits

Bit Name   Binary Digit   Description

PIPE       1              Pipeline andmask is flagged
SKY        2              Improved mask for sky emission lines
DLA        3              Mask for DLA cores

See § 4.1 for full description

4.1. Pixel Masks 

We now describe the bitmask system to flag pixels that should be discarded for Lyα forest analysis.
This process flags pixels identified by the pipeline as problematic, damped Lyα absorbers (DLAs), and
sky emission lines. These mask bits are combined in a binary sense: e.g., a pixel in which bits 1 and
3 are set will store a value of 2^1 + 2^3 = 10. These masks are stored in the MASK_COMB vector in
each spectrum, and the flags are summarized in Table 4.
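A combined mask value can be decoded as in the sketch below. It follows the worked example in the text, in which a set bit n contributes 2^n; whether the released files use exactly this convention should be verified against the data, and the helper names are illustrative.

```python
# Binary digit of each flag, as listed in the combined maskbits table.
MASK_BITS = {"PIPE": 1, "SKY": 2, "DLA": 3}


def decode_mask(value):
    """Names of the flags set in a combined mask value (bit n contributes 2**n)."""
    return [name for name, bit in MASK_BITS.items() if value & (1 << bit)]


def is_clean(value):
    """A pixel with no flags set can be kept for Lya forest analysis."""
    return value == 0


if __name__ == "__main__":
    print(decode_mask(10))   # the worked example from the text: bits 1 and 3
    print(is_clean(0))
```

For most analyses only the `is_clean` test is needed; decoding individual flags is useful when a user wishes to retain, for example, pixels flagged only for a DLA core.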

4.1.1. Pipeline Mask 

The BOSS spectral pipeline (idlspec2d; Schlegel et al., in prep.) utilizes a system of 25 pixel
mask bits to flag problems that may have occurred during
the pipeline reduction process^34. The ORMASK vector
in the co-added spectrum denotes pixels flagged by the 
pipeline in at least one of the individual exposures, while 
the ANDMASK vector denotes pixels that were flagged 
in the equivalent CCD column of all the individual 
exposures. The flagged pixels often have their inverse 
variances set to zero by the pipeline, but the pipeline 
masks are more comprehensive. 

In principle, all co-added pixels with ANDMASK = 0 are free of problems, while flagged pixels may or
may not be useful depending on the user's application and discretion. However, in the DR9 version of
the pipeline, mask bit 24 ("NODATA", triggered by lack of detected flux) is erroneously set in the
dichroic overlap region between the blue and red cameras, even when not all individual exposures were
affected. This problem affects 9.7% of all pixels, which are actually useable^35.

For simplicity, we amalgamate the pipeline ANDMASK
into our combined mask, such that maskbit 1 indicates
pixels flagged by ANDMASK (except ANDMASK = 2^24).
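The rule for the pipeline flag can be sketched as a bitwise test that ignores bit 24. This is an illustration of the rule as stated in the text, not the code used to build the released masks; the function name is hypothetical.

```python
NODATA_BIT = 24  # "NODATA" bit, erroneously set in the dichroic overlap region


def pipe_flag(and_mask):
    """True if any ANDMASK bit other than NODATA (bit 24) is set."""
    return (and_mask & ~(1 << NODATA_BIT)) != 0


if __name__ == "__main__":
    print(pipe_flag(0))                    # unflagged pixel
    print(pipe_flag(1 << NODATA_BIT))      # only NODATA set: treated as useable
    print(pipe_flag((1 << 24) | (1 << 3))) # NODATA plus another bit: flagged
```

Ignoring bit 24 is equivalent to the stated exception: a pixel whose ANDMASK equals exactly 2^24 is left unflagged, while any other non-zero value is flagged.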

4.1.2. Sky Mask 

At the typical quasar magnitudes targeted by BOSS, 
the main contribution to pixel noise comes from the sky. 
This is particularly noticeable at pixels corresponding to 
the sky emission lines, where large deviations in flux are 
seen. These pixels should be discarded since the astro- 
physical signal has been washed out by the sky variance. 
In the pipeline, mask bit 23 ("SKYMASK") is used to
flag pixels where the object's estimated sky flux is (a)
more than 10σ above the object flux, and (b) more than
1.25 times the median flux over the neighboring 99 pixels.
However, we have found that using this criterion alone is
insufficient to fully mask strong sky emission lines; this
is illustrated in Figure 4, which shows the stacked
spectrum of 1000 quasars centered around the O I λ5577.338

34 http://www.sdss3.org/dr9/algorithms/bitmask_sppixmask.php

35 Note that this issue affects only the co-added spectra; users
of the individual exposures should not ignore maskbit 24.



[Figure 4: two panels; panel title: "Stack of 1000 quasars with 3.08 < z < 3.18"; x-axis: Wavelength (Å), 5550-5610]

Fig. 4. - Upper panel: Stacked flux from 1000 quasar spectra
with z_qso = 3.08-3.18 that have a flat intrinsic spectrum
(λ_rest ≈ 1350 Å) around the 5577.338 Å O I sky emission line. The
features are caused by increased noise variance from the sky line.
The vertical dotted lines provide a visual reference point for the
extent of the sky line's effect on the spectrum. Lower panel: The
pipeline noise inverse variance, where masked pixels have been set
to zero using the pipeline masks (black solid line) and our sky mask
described in § 4.1.2 (dashed red line). The non-zero pixels within
the dotted vertical lines indicate that the pipeline masks do not
adequately mask the O I line, whereas our new sky mask has
done so thoroughly.

telluric emission line. The lower panel shows the cor- 
responding inverse variances, with the pixels masked by 
the pipeline set to zero — the non-zero values within the 
envelope of the sky line indicate inadequate masking by 
the pipeline. In addition, weaker sky lines are often left 
unmasked by the pipeline. 

Since the sky calibration fibers in BOSS are themselves processed by the standard pipeline
(including the sky subtraction estimated from all sky fibers in each plate), the resulting residual
spectra can be used to analyze the efficacy of the latter procedure. The mean and rms of these sky
residuals are shown in Figure 5. Using this, we generate a list of sky wavelengths to be masked as
follows: we first define a 'sky continuum' as the running average of the residual rms fluctuation
within a ±25 pixel window centered on each pixel, and mask pixels that are above 1.25× the sky
continuum. The continuum and mask list are then iterated until they converge; the final masking
threshold is shown as the red curve in Figure 5. Pixels that are within 1.5 pixels of the listed
wavelengths have mask bit 2 set in our combined mask, and should be discarded in any analysis. While
there will still be a residual variance contribution from the sky subtraction, it should now vary
smoothly with wavelength. The effect of the new sky mask can also be seen in the red dashed line in
Figure 4: the O I feature is now fully masked.
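The iterative construction of the sky mask can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes that pixels already masked are excluded from the running average on the next iteration (the text only says that the continuum and mask are iterated to convergence), and the function and parameter names are hypothetical.

```python
def build_sky_mask(rms, half_window=25, threshold=1.25, max_iter=50):
    """Iteratively flag pixels whose residual rms exceeds threshold x a running mean.

    rms: list of residual sky rms values, one per pixel.
    Returns (masked, continuum): a list of booleans and the final running mean.
    """
    n = len(rms)
    masked = [False] * n
    continuum = [0.0] * n
    for _ in range(max_iter):
        continuum = []
        for i in range(n):
            lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
            # Assumption: masked pixels do not contribute to the sky continuum.
            values = [rms[j] for j in range(lo, hi) if not masked[j]]
            if not values:  # whole window masked: fall back to all pixels
                values = rms[lo:hi]
            continuum.append(sum(values) / len(values))
        new_masked = [rms[i] > threshold * continuum[i] for i in range(n)]
        if new_masked == masked:  # converged
            break
        masked = new_masked
    return masked, continuum


if __name__ == "__main__":
    # Flat residual rms with one strong and one weaker sky line.
    rms = [1.0] * 200
    rms[100], rms[101] = 5.0, 3.0
    masked, _ = build_sky_mask(rms)
    print([i for i, m in enumerate(masked) if m])
```

In this toy example the first pass already flags both lines, and the second pass confirms the mask once the flagged pixels no longer inflate the running average.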

The mean residual of all sky-subtracted sky fibers is shown in the blue curve of Figure 5: there is
a small positive bias after the sky subtraction at the level of ~0.01 × 10^-17 erg s^-1 cm^-2 Å^-1,
rising to ~0.1 × 10^-17 erg s^-1 cm^-2 Å^-1 at the blue and red ends of the spectra. This is the
cause of the zero-point flux errors noted in Figure 4 of DR9Q. This bias arises because the pipeline
assigns a variance to the pixels based on their fluxes prior to the sky-subtraction step; this
Fig. 5. - The mean (blue) and RMS (black) of the residual sky flux in the BOSS DR9 spectra, as estimated from sky fibers. The red
curve indicates the threshold used to define our new sky mask: wavelengths where the sky residual RMS rises above this threshold are
masked. The feature at 6000 Å < λ < 6300 Å in the mean residual is an artifact in the dichroic region, where data from the blue and red
CCDs overlap.

is relatively flat with restframe wavelength: 1330 Å < λ_rest < 1380 Å and
1450 Å < λ_rest < 1500 Å. For each individual quasar, we then compute the ratio of the pipeline error
estimate, σ_p, to the root-mean-square (rms) of the flux dispersion about the mean within these side
bands. This quantity is then averaged over all DR9Q quasars; with the varying quasar redshifts, this
gives us a wavelength-dependent measure of the accuracy of the pipeline noise estimation (blue points
in Figure 6). If the pipeline yields a perfect noise estimate, the plotted quantity should be unity
at all wavelengths; on the other hand, under (over) estimates will produce values below (above)
unity. The flux dispersion in the blue part of the spectra (λ < 4000 Å) is seen to be about 15%
larger than expected from the noise estimate given by the pipeline. The discrepancy decreases with
increasing wavelength, and the two estimates are in agreement at λ ~ 5700 Å.
This test clearly indicates a wavelength-dependent mis- 
calibration of the noise. However, since a fraction of the 
flux rms in the quasar side bands comes from interven- 
ing metals along the sightline, this procedure could be 
overly conservative in deriving the underestimation of 
the pipeline noise. Instead we recalibrate the pixel noise 
using three independent contributions derived from the 
data, which we shall now describe. 

Because the wavelength solution can vary between ex- 
posures, we first define a common wavelength grid with 
2.5 A pixels, about three times larger than on individ- 
ual exposures. The flux / in a given rebinned pixel is 
the weighted average of the flux of the contributing pix- 
els of the original spectrum, with the weight taken to 
be the pixel inverse- variance cr~ 2 . Input pixels which 
overlap two rebinned pixels are assigned to whichever 
rebinned pixel they overlap the most. The correction 
terms cor coa( jd(A), cor cxp and corfl ux (A, /) described be- 
low are computed from these rebinned single-exposure 
and coadded spectra. The total correction to the pixel 
noise is given by 

cor to t(A, /) = cor exp x cor coadd (A) x cor flux (/, A) . (2) 

The various noise correction factors are: 

Individual Exposure Correction, cor cxp : We check 
the reliability of the pipeline error estimates on 



Fig. 6. — The ratio of the pipeline noise estimate, <j p , to the 
actual flux dispersion in the spectra. The blue points denote this 
ratio as estimated from the quasar 1330 A < A rcst < 1380 A 
and 1450 A < A re st < 1500 A side-bands. The purple points 
indicate the contribution to this bias from the pipeline estimates 
in the individual exposures that comprise each BOSS spectrum, 
while the yellow points show the bias introduced by the pipeline 
co-addition procedure. The red points show the total correction 
from our procedure, cortot- 

underweights upwards sky fluctuations with respect to 
downwards sky fluctuations, providing an underestimate 
of the total sky background in low-SNR pixels. Since the 
flux transmission in the optically-thin Lya forest rarely 
drops to zero flux at BOSS resolution, we do not expect 
this to be a significant issue in Lya forest analysis with 
BOSS data, but users studying DLAs and Lyman-limit 
systems (LLS) need to take this into account. 

4.2. Noise Corrections 

An estimate of the noise associated with each pixel in 
each spectrum, cr p , is provided by idlspec2d. However, 
the pipeline is known to suffer f rom systematic under- 
estimates of the noise (see, e.g. iMcDonald et al.l 120061 : 
iDesiacaues et a l. 2007) To investigate the extent of this, 
we examine the pixel variance in spectral regions that are 
intrinsically smooth and flat. We use two AA rcs t ~ 50 A 
regions of quasar spectra (called 'side-bands'), redwards 
of the Lya peak (so as not to be affected by absorption 
from the Lya forest) and where the quasar continuum 



Lee et al. 



the individual exposures that comprise each BOSS 
spectrum. For instance, for N exposures of an ob- 
ject, the distribution of the pull S defined by 



S 



1 



N/2 

E 

i=0 



h 



(3) 



p,2i+l 



'p2i 



should be a Gaussian with zero mean and as = 1. 
In case of an odd total number of exposures, the 
last one is arbitrarily dropped in the computation 
of S. We calculate 175 for each quasar as the rms 
over the wavelength range of the Lya forest and use 
that as a per-quasar correction, cor oxp = l/erg(A). 
In Figure |5] we plot cor exp as a function of the av- 
erage observer- frame wavelength of the Lya forest, 
binned over multiple quasars per wavelength bin. 
The results indicate an underestimation of the pixel 
noise by about 6%, with a wavelength dependence 
of less than 3%. 

Co-addition Correction, $\mathrm{cor}_{\rm coadd}(\lambda)$: We examine the propagation of the
noise estimate in the coaddition process by comparing the noise given by the pipeline on the
coadded frame (variance $\sigma^2_{p,\rm coadd}$) to the noise computed from the weighted mean
of the $N$ exposures that contributed to the coadd, with variance $\sigma^2_{p,\rm mean}$ such
that $\sigma^{-2}_{p,\rm mean} = \sum_{i=0}^{N-1} \sigma^{-2}_{p,i}$, where the $\sigma_{p,i}$
here are not corrected by $\mathrm{cor}_{\rm exp}$ since we assume that the noise estimate
errors in individual exposures and those introduced by the co-addition process are orthogonal.
The correction term for the co-addition process is defined by
$\mathrm{cor}_{\rm coadd} = \sigma_{p,\rm coadd}/\sigma_{p,\rm mean}$. This increases with
wavelength, from about 0.95 at $\lambda$ = 4000 Å to about 1.10 at $\lambda$ = 6000 Å, and is
shown as the yellow points in Figure 6.

Flux-dependent Correction, $\mathrm{cor}_{\rm flux}(f, \lambda)$: Within a given side-band, the
ratio of the pixel noise, corrected by $\mathrm{cor}_{\rm coadd} \times \mathrm{cor}_{\rm exp}$,
to the flux dispersion in the same rest-frame wavelength range for all quasars exhibits a flux
dependence. We correct for this effect by applying a linear correction
$\mathrm{cor}_{\rm flux}(f, \lambda)$ that we fit separately in five distinct wavelength bins,
with the corrections bounded at $\mathrm{cor}_{\rm flux} > 0.9$. For typical fluxes in the Lyα
forest, the correction ranges between 1-5% for $\lambda$ < 5000 Å and up to 9% for
$\lambda$ > 5500 Å. This mean over the spectra in our sample is shown as the black points
in Figure 6.
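
The individual-exposure check described above can be illustrated with a short numerical sketch. It assumes that the exposures have already been rebinned onto the common wavelength grid, that the pull is normalised by the square root of the number of exposure pairs (so that correctly estimated noise gives unit dispersion), and that the rms is taken about zero. The function names are hypothetical and are not part of the BOSS pipeline.

```python
import numpy as np

def exposure_pull(flux, sigma):
    """Per-pixel pull from consecutive exposure pairs (cf. Equation 3).

    flux, sigma: arrays of shape (n_exp, n_pix) on a common wavelength grid.
    """
    flux = np.asarray(flux, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_pairs = flux.shape[0] // 2          # an odd last exposure is dropped
    if n_pairs == 0:
        raise ValueError("need at least two exposures")
    even = slice(0, 2 * n_pairs, 2)       # exposures 0, 2, 4, ...
    odd = slice(1, 2 * n_pairs, 2)        # exposures 1, 3, 5, ...
    diff = flux[odd] - flux[even]
    err = np.sqrt(sigma[odd] ** 2 + sigma[even] ** 2)
    return (diff / err).sum(axis=0) / np.sqrt(n_pairs)

def cor_exp(flux, sigma):
    """Per-quasar correction 1 / sigma_delta, with sigma_delta the rms of the pull."""
    delta = exposure_pull(flux, sigma)
    return 1.0 / np.sqrt(np.mean(delta ** 2))
```

If the pipeline noise is too small by some factor, the pull distribution is wider than unity by the inverse of that factor, and `cor_exp` returns a value below one.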

The pipeline noise estimate is divided by the overall noise correction,
$\sigma_{\rm cor} = \sigma_p / \mathrm{cor}_{\rm tot}(\lambda, f)$, to yield a more accurate
noise estimate. The average correction for our spectra is shown as the red points in
Figure 6. The corrections for each object in our sample are stored in the NOISE_CORR vector of
the corresponding spectrum. We have derived the above corrections only for the blue side of
the spectra, $\lambda$ < 6300 Å, which reaches up to $z_\alpha = 4.18$ (see Figure 5) and
comprises the vast majority of Lyα forest pixels. Pixels with $\lambda$ > 6300 Å have their
noise corrections set to unity, $\mathrm{cor}_{\rm tot} = 1.0$, such that the pipeline
noise remains uncorrected on the red side of the spectra.
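
As a minimal sketch of how a user might apply this correction, the snippet below divides the pipeline noise by the stored correction vector and leaves pixels redward of 6300 Å untouched. The function name and argument layout are assumptions made for illustration; only the relation $\sigma_{\rm cor} = \sigma_p/\mathrm{cor}_{\rm tot}$ and the 6300 Å limit come from the text.

```python
import numpy as np

def corrected_noise(wave, sigma_pipe, noise_corr, red_limit=6300.0):
    """Return sigma_cor = sigma_p / cor_tot on observed wavelengths (Angstrom)."""
    wave = np.asarray(wave, dtype=float)
    sigma_pipe = np.asarray(sigma_pipe, dtype=float)
    # Enforce cor_tot = 1 redward of the limit, where no correction was derived.
    cor_tot = np.where(wave > red_limit, 1.0, np.asarray(noise_corr, dtype=float))
    return sigma_pipe / cor_tot
```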



Several caveats should be kept in mind regarding these noise corrections. Some of the error
in the pipeline noise estimates arises from scatter in the broad-band fluxing of the
individual exposures and acts as a covariance between the individual pixels. As such, our
noise corrections do not take into account off-diagonal terms of this overall covariance. We
also note that there is an uncertainty of several percent regarding these noise corrections,
e.g. the 'side-band' and 'total correction' curves in Figure 6 disagree by several percent
although the overall wavelength dependence is in good agreement. However, 3D correlation
analyses should not be sensitive to errors in the noise estimate, although 1D analyses will
require a more careful approach than what we have presented here.

We expect the pipeline noise estimates to be significantly improved when the new spectral
extraction algorithm of Bolton & Schlegel (2010) is implemented in subsequent BOSS data
releases. Alternatively, Lee et al. (2012) will describe a probabilistic method for accurate
noise estimation that allows separation of photon-counting and CCD noise components.

4.3. Damped Lya Absorbers 

The cosmological utility of the optically thin Lyα forest
($N_{\rm HI} \lesssim 10^{17}$ cm$^{-2}$) relies on the fact that the absorption field is a
weakly non-linear tracer of the underlying dark matter fluctuations. Damped Lyα absorbers
(DLAs; see Wolfe et al. 2005 for a review), although also caused by neutral hydrogen
absorption in the IGM, are collapsed objects that do not have the same correspondence with the
large-scale density field. Moreover, each individual DLA causes large damped absorption
profiles that affect large swathes ($\Delta v > 5000$ km s$^{-1}$) of affected sightlines. It
is thus preferable to remove DLAs from any analysis of the large-scale Lyα forest, although it
is impossible to detect and remove all DLAs from the data, especially in the noisier spectra.

In their early analysis of BOSS data, Slosar et al. (2011) simply discarded sightlines that
contained DLAs identified by visual inspection. This is a sub-optimal approach: while
approximately 10% of all Lyα forest sightlines contain DLAs, only ~10% of the Lyα forest
pixels in each affected sightline are directly impacted by the DLA. It would therefore be
more economical to mask the saturated absorption cores of the DLAs, and correct for the effect
of their broad damping wings in affected spectra.

To deal with DLAs, we use a combination of three different methods, described in Carithers et
al. (2012), to detect DLAs in the BOSS quasar sightlines: visual inspection, Fisher
Discriminant Analysis, and template cross-correlation.

As mentioned above, all DR9Q spectra are visually inspected and spectra are flagged when a DLA
is recognized by the inspector. In addition, we employ two automated procedures for
identifying DLAs. The first, described in Noterdaeme et al. (2012), uses a set of DLA
absorption profile templates of various column densities that are cross-correlated with the
quasar spectra. If the correlation coefficient is sufficiently high, a fit to a Voigt profile
is performed to measure the column density and DLA redshift. If associated metal absorption
lines are present redwards of the quasar Lyα emission line, they are used to refine the
redshift. The second automated method, described in Carithers et al. (2012), is based on a
Fisher Discriminant (Fisher 1936) machine-learning algorithm. After an initial screening that
identifies spectral regions that are consistent with zero flux density and inconsistent with
the continuum, a fit to a Voigt profile is performed. The errors and chi-squares from the fit,
along with the initial screening probability, are passed to a Fisher Discriminant that has
been trained on the visual identification DLA sample. Metal lines, when present, are used by
this method as well to refine the DLA redshift.

Fig. 7. The spectrum of a Lyα forest sightline (3587-55182-310; RA = 8.975741,
DEC = -0.231411) with a DLA at $z_{\rm dla} = 2.477$, with a neutral hydrogen column density
$\log_{10} N_{\rm HI} = 21.19$. The red spectrum shows the same spectrum after applying the
steps described in § 4.3: the central equivalent width $W$ (Equation 4) of the DLA has been
masked, while remaining pixels have been corrected for damping wings (Equation 5). For
clarity, both spectra have been smoothed with a 3-pixel mean boxcar function. [Horizontal
axis: Wavelength (Å), 4100 to 4350.]

Any DLA recognition algorithm must balance the requirements for efficiency and purity, and
the most severe challenge is in the regime of low SNR and low column density. Each of the
three methods has strengths and weaknesses in this regard. To retain both high efficiency and
high purity, we define a concordance catalog (Carithers et al. 2012) consisting of all DLAs
found by at least two of the three methods (in practice, the majority are found by all three
techniques). In those cases where a DLA is found by both the template and FDA methods, the
average of the two redshifts and column densities is used. Both these methods have been
tested on the same set of mock spectra (Font-Ribera et al. 2012) that have DLAs artificially
inserted; both yield detection efficiencies of > 95% for DLAs with
$\log_{10} N_{\rm HI} > 20.3$ in spectra with continuum-to-noise ratios (footnote 36) of
CNR > 2 per pixel.

For each DLA within this concordance catalog, we mask the wavelength region corresponding to
the equivalent width (Draine 2011):

$$W \approx \lambda_\alpha \left[ \frac{e^2}{m_e c^2}\, N_{\rm HI}\, f_\alpha\, \lambda_\alpha \left( \frac{\gamma_\alpha \lambda_\alpha}{c} \right) \right]^{1/2} \qquad (4)$$

where $\lambda_\alpha = 1216$ Å is the rest-frame wavelength of the hydrogen Lyα transition,
$e$ is the electron charge, $m_e$ is the electron mass, $c$ is the speed of light,
$N_{\rm HI}$ is the H I column density of the DLA, $f_\alpha$ is the Lyα oscillator strength,
and $\gamma_\alpha$ is the sum of the Einstein A coefficients for the transition. Pixels that
are masked due to DLAs are flagged by maskbit 3 in our combined mask.

36 Where the continuum is, in this case, defined separately within each algorithm; see
Noterdaeme et al. (2012) and Carithers et al. (2012) for details.
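
To give a feel for the size of the masked region, the following sketch evaluates Equation 4 numerically in CGS units. The physical constants and Lyα atomic data are standard values assumed here rather than taken from the text, and the function name is hypothetical.

```python
import numpy as np

# CGS constants and Lya atomic data: standard values, assumed for this sketch.
E_CHARGE = 4.80320e-10     # electron charge (esu)
M_E = 9.10938e-28          # electron mass (g)
C_LIGHT = 2.99792458e10    # speed of light (cm / s)
F_ALPHA = 0.4164           # Lya oscillator strength
GAMMA_ALPHA = 6.265e8      # sum of Einstein A coefficients (1 / s)
LAMBDA_ALPHA = 1215.67     # Lya rest wavelength (Angstrom)

def dla_equivalent_width(log_nhi):
    """Rest-frame equivalent width W in Angstrom for a given log10 N_HI (cf. Equation 4)."""
    lam_cm = LAMBDA_ALPHA * 1e-8
    r_e = E_CHARGE ** 2 / (M_E * C_LIGHT ** 2)   # classical electron radius (cm)
    x = r_e * 10.0 ** log_nhi * F_ALPHA * lam_cm * (GAMMA_ALPHA * lam_cm / C_LIGHT)
    return LAMBDA_ALPHA * np.sqrt(x)
```

For the DLA shown in Figure 7 ($\log_{10} N_{\rm HI} = 21.19$) this evaluates to roughly 29 Å in the DLA rest frame, and $W$ scales as the square root of the column density.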

Beyond this region, we correct for the damping wings of the DLA by multiplying each pixel in
the spectrum with $\exp(\tau_{\rm wing}(\Delta\lambda))$, where

$$\tau_{\rm wing}(\Delta\lambda) = \frac{e^2}{m_e c^2}\, \frac{\gamma_\alpha \lambda_\alpha}{4\pi c}\, f_\alpha N_{\rm HI} \lambda_\alpha \left( \frac{\lambda_\alpha}{\Delta\lambda} \right)^2 \qquad (5)$$

and $\Delta\lambda \equiv \lambda - \lambda_\alpha$ is the wavelength separation in the DLA
restframe. Each of the spectra in our sample includes a vector, DLA_CORR, that stores the
damping wing corrections $c_{\rm dla} = \exp(\tau_{\rm wing})$; this is set to unity in
spectra without intervening DLAs. This correction vector should be multiplied into the flux
and noise vectors; alternatively, users might opt to make more stringent cuts based on the
value of the damping wing corrections. Figure 7 shows a DLA in our sample, along with the
masks and corrections that we have applied to correct for it.
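
The damping-wing correction of Equation 5 can be sketched as follows. As in any numerical evaluation of this formula, the constants are standard CGS values assumed for illustration, and the function names are hypothetical. The correction diverges at the line centre, which is harmless in practice because those pixels fall inside the masked core.

```python
import numpy as np

# CGS constants and Lya atomic data: standard values, assumed for this sketch.
E_CHARGE = 4.80320e-10     # esu
M_E = 9.10938e-28          # g
C_LIGHT = 2.99792458e10    # cm / s
F_ALPHA = 0.4164
GAMMA_ALPHA = 6.265e8      # 1 / s
LAMBDA_ALPHA = 1215.67     # Angstrom

def damping_wing_tau(delta_lambda, log_nhi):
    """tau_wing at rest-frame separation delta_lambda (Angstrom) from line centre."""
    lam_cm = LAMBDA_ALPHA * 1e-8
    r_e = E_CHARGE ** 2 / (M_E * C_LIGHT ** 2)
    amp = (r_e * F_ALPHA * 10.0 ** log_nhi * lam_cm / (4.0 * np.pi)
           * (GAMMA_ALPHA * lam_cm / C_LIGHT))
    return amp * (LAMBDA_ALPHA / np.asarray(delta_lambda, dtype=float)) ** 2

def dla_correction(wave_obs, z_dla, log_nhi):
    """Multiplicative correction exp(tau_wing) for observed wavelengths (Angstrom)."""
    delta = np.asarray(wave_obs, dtype=float) / (1.0 + z_dla) - LAMBDA_ALPHA
    return np.exp(damping_wing_tau(delta, log_nhi))
```

With these definitions the optical depth at the edge of the masked region, $\Delta\lambda = W/2$, equals $1/\pi$ for any column density, so the correction applied just outside the mask is about $e^{1/\pi} \approx 1.37$.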

The Z_DLA and LOG_NHI fields in our catalog (Table 3) list the DLA absorber redshift and
base-10 logarithm of the neutral hydrogen column density (in cm$^{-2}$), respectively, for
each spectrum in our sample. Both fields are set to -1 in spectra where no DLAs are detected.



4.4. Quasar Continua



In any Lyα forest analysis, the transmitted Lyα flux must be extracted by dividing the
observed flux by an estimate for the intrinsic quasar continuum. This is a non-trivial step
even in high-SNR spectra. Traditionally, power-law extrapolation from
$\lambda_{\rm rest}$ > 1216 Å has been used to estimate the quasar continuum in noisy spectra
(e.g. Press et al. 1993). However, this technique is now known to be unreliable due to a
break in the quasar continuum at $\lambda_{\rm rest} \sim 1200$ Å (Telfer et al. 2002).
Moreover, the uncertain blue-end spectrophotometry in BOSS (see § 5.1) makes continuum
extrapolations highly unreliable. It is thus necessary to use the information in the Lyα
forest itself to estimate the continuum.

For each BOSS DR9 quasar spectrum that satisfies our selection criteria in § 3, we provide a
continuum estimate using a modified version of the mean-flux regulated principal component
analysis (MF-PCA) technique described in Lee et al. (2012). This technique is essentially a
two-step process: an initial PCA fit to the $\lambda_{\rm rest}$ > 1216 Å region of the
quasar spectrum to predict the shape of the Lyα forest continuum, followed by a 'mean-flux
regulation' step to ensure that the continuum amplitude is consistent with published
constraints on the Lyα forest mean flux, $\langle F \rangle(z)$.

4.4.1. PCA Fitting 

The first step in our continuum estimation process is to fit PCA templates to the quasar
spectrum redwards of its Lyα emission line, in the $\lambda_{\rm rest}$ = 1216-1600 Å range.

However, since intervening metal absorption in that region might bias our continuum fit, we
first execute a procedure to identify and mask these absorbers prior to fitting the
continuum. For this purpose, we follow the procedure described in Lundgren et al. (2009).
First, we define a pseudo-continuum by using a variation of a moving mean that robustly fits
both the quasar emission lines and flatter spectral regions over a broad range of quasar
spectral morphologies. Residual absorption features in the normalized spectrum are then each
fit with a Gaussian to produce estimates of the equivalent width, $W$, and associated errors
$\sigma_W$. Absorption lines detected with $W/\sigma_W > 3$ have their pixel inverse
variances set to zero and are ignored in the subsequent step (footnote 37).

Fig. 8. Spectra (black) of randomly-selected quasars from our sample, and their corresponding
MF-PCA continua (red). The two lower panels illustrate objects with inferior continuum fits
(CONT_FLAG = 2): in panel (d), a case where strong absorbers have stymied our efforts at
absorption masking and biased the continuum fit; in panel (e), a weak emission-line quasar
that is not represented in our quasar templates. These unsatisfactory continua comprise only
1.7% of the total sample.
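
The masking criterion can be sketched in a few lines. The representation of each detected line as a wavelength interval plus an equivalent width and its error is an assumption made for illustration, since the text does not specify how far around each absorber the mask extends; the function name is hypothetical.

```python
import numpy as np

def mask_absorbers(wave, ivar, lines, nsigma=3.0):
    """Zero the inverse variance of pixels inside significantly detected absorbers.

    lines: iterable of (wave_lo, wave_hi, W, sigma_W) tuples, one per fitted line.
    """
    wave = np.asarray(wave, dtype=float)
    out = np.array(ivar, dtype=float, copy=True)   # leave the input untouched
    for lo, hi, w, sigma_w in lines:
        if sigma_w > 0 and w / sigma_w > nsigma:
            out[(wave >= lo) & (wave <= hi)] = 0.0
    return out
```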

We obtain the initial PCA continuum, $C_{\rm PCA}$, by performing an inverse
variance-weighted least-squares fit to the 1216 Å < $\lambda_{\rm rest}$ < 1600 Å region
redwards of the quasar Lyα emission line, using quasar templates with 8 principal components.
Two different PCA quasar templates were employed: 1) Suzuki et al. (2005), who used z < 1
quasars observed by the Hubble Space Telescope, in which the $\lambda_{\rm rest}$ < 1216 Å
continuum can be clearly defined due to the lower absorber density; and 2) Paris et al.
(2011), who selected a sample of z ~ 3 quasars with high SNR from SDSS DR7 and carried out
spline-fitting on the Lyα forest continuum to estimate the intrinsic quasar spectrum in that
region. Both templates are used to fit each BOSS quasar; the better-fitting template is then
chosen based on the reduced chi-squared of the fit, and this is denoted by either 'SUZUKI05'
or 'PARIS11' in the CONT_TEMPLATE field of our catalog (Table 3). We find that for the DR9
sample, about 85% of the quasars were better represented by the Suzuki et al. (2005)
templates while 15% were better fit with the Paris et al. (2011) templates; in contrast, the
corresponding percentages in DR7 (c.f. Lee et al. 2012) were 30% and 70%, respectively. We
suspect that this is because fainter quasars are targeted in DR9 than in DR7; these faint
quasars are better matched by the lower-luminosity quasars that comprise the Suzuki et al.
(2005) templates.

37 This absorber masking step was not done in Lee et al. (2012); they instead used an
iterative clipping method that was less effective in discarding intervening absorbers.
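
The core of an inverse variance-weighted least-squares template fit can be sketched as below. This is a simplified illustration that fits only the linear coefficients of a mean spectrum plus principal components; any additional parameters of the full model are deliberately left out, and the function and argument names are hypothetical.

```python
import numpy as np

def fit_pca_continuum(flux, ivar, mean_spec, components):
    """Weighted linear least-squares fit of PCA coefficients.

    flux, ivar, mean_spec: arrays of shape (n_pix,) on the template rest-frame grid.
    components: array of shape (n_comp, n_pix).
    Returns the fitted continuum and the coefficient vector.
    """
    flux = np.asarray(flux, dtype=float)
    mean_spec = np.asarray(mean_spec, dtype=float)
    components = np.asarray(components, dtype=float)
    w = np.sqrt(np.asarray(ivar, dtype=float))   # masked pixels (ivar = 0) get zero weight
    design = (components * w).T                  # shape (n_pix, n_comp)
    target = (flux - mean_spec) * w
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return mean_spec + coeffs @ components, coeffs
```

Running the same fit with each template set and comparing the reduced chi-squared values would reproduce the template-selection step described above.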

However, not all the BOSS quasars are well-described by either of the quasar templates
described above, in which case we cannot obtain a well-fitted PCA continuum. There are also
cases in which strong absorption systems lying on top of the quasar emission lines (most
notably Lyα) were not identified by the absorption-masking procedure, which biases the
continuum fit. Initially, we attempted to use the reduced chi-squared statistic,
$\chi^2/\nu$, to quantify the fit quality, where $\nu = N_{\rm pix} - 11 - 1$ is the number
of degrees of freedom in our 11-parameter PCA model and $N_{\rm pix}$ is the number of pixels
evaluated in the range 1216 Å < $\lambda_{\rm rest}$ < 1600 Å. We found that while most
objects with $\chi^2/\nu > 2$ were indeed badly fitted, many unsatisfactory fits had
$\chi^2/\nu \sim 1$, mostly in situations where absorption features were fitted by the
principal components, giving unphysical continua. We have therefore visually inspected all
the fitted continua in the restframe region redwards of 1216 Å, and flagged objects that were
not well fit by our PCA templates. We have listed both the reduced chi-squared and visual
continuum flags in the CHISQ_CONT and CONT_FLAG fields, respectively, of the BOSSLyaDR9_cat
catalog.
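
For reference, the fit-quality statistic can be computed as in the sketch below. It assumes that $N_{\rm pix}$ counts only pixels with non-zero inverse variance, which the text does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def reduced_chi2(flux, model, ivar, n_params=11):
    """chi^2 / nu with nu = N_pix - n_params - 1, using only unmasked pixels."""
    flux = np.asarray(flux, dtype=float)
    model = np.asarray(model, dtype=float)
    ivar = np.asarray(ivar, dtype=float)
    good = ivar > 0
    chi2 = np.sum(ivar[good] * (flux[good] - model[good]) ** 2)
    nu = int(good.sum()) - n_params - 1
    if nu <= 0:
        raise ValueError("not enough unmasked pixels for the number of parameters")
    return chi2 / nu
```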

Our convention for the visual inspection continuum flags is as follows:

CONT_FLAG=1: The fitted PCA continuum appears to describe the intrinsic quasar continuum
well. We allow unphysical features in the continua (e.g. the 'absorption feature' near
$\lambda_{\rm rest}$ = 1216 Å in panel (a) of Figure 8), if they do not impact the overall
fit. These comprise 98.3% of all spectra in our sample.

CONT_FLAG=2: The fitted PCA continuum is badly fit and does not resemble the intrinsic quasar
spectrum. These cases tend to be caused by either very strong absorbers that have eluded our
masking process, or quasars with continuum shapes that are not captured by our templates (see
panels (d) and (e) in Figure 8). These comprise 1.7% of all spectra in our sample.

Because we apply the mean-flux regulation step (next section), even the worst continua with
CONT_FLAG=2 should yield rms continuum errors well under ~10%. We therefore do not recommend
that users discard spectra based on these flags, but rather use them as a possible systematic
check.

4.4.2. Mean-flux Regulation

The initial PCA continuum fit, $C_{\rm PCA}$, provides a prediction for the shape of the weak
quasar emission lines in the 1041 Å < $\lambda_{\rm rest}$ < 1185 Å region, but the overall
amplitude is uncertain due to the quasar power-law break and spectrophotometric errors. We
therefore require that each quasar continuum match the expected Lyα forest mean flux
evolution, given by Faucher-Giguere et al. (2008); we use their power-law-only fit without
metal corrections:

$$\langle F \rangle(z) = \exp\left[-0.001845\,(1 + z_{\rm abs})^{3.924}\right] , \qquad (6)$$

where $z_{\rm abs}$ is the absorber redshift.
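
A direct transcription of Equation 6 is given below as a small sketch, together with the conversion from observed wavelength to Lyα absorber redshift. The rest wavelength of 1215.67 Å and the function names are assumptions made for illustration.

```python
import numpy as np

LAMBDA_ALPHA = 1215.67  # Lya rest wavelength in Angstrom (standard value, assumed)

def mean_flux(z_abs):
    """Mean transmitted flux <F>(z) from Equation 6."""
    z_abs = np.asarray(z_abs, dtype=float)
    return np.exp(-0.001845 * (1.0 + z_abs) ** 3.924)

def absorber_redshift(wave_obs):
    """Lya absorber redshift of a pixel at observed wavelength wave_obs (Angstrom)."""
    return np.asarray(wave_obs, dtype=float) / LAMBDA_ALPHA - 1.0
```

At $z_{\rm abs} = 2.5$ this gives a mean transmission of about 0.78, decreasing towards higher redshift.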

We fit a linear correction function of the form $(a + b\,\lambda_{\rm rest})$, such that the
final continuum, $C_{\rm MF}$, yields a mean flux in agreement with Equation 6. This is
different from Lee et al. (2012), who performed this step using a quadratic fitting function
of the form $(1 + a\hat{\lambda} + b\hat{\lambda}^2)$, where
$\hat{\lambda} = \lambda_{\rm rest}/1280\,\text{Å} - 1$; we changed to the linear correction
function since it is easier to compute analytic corrections for large-scale power along the
line-of-sight (e.g., Appendix A in Slosar et al. 2011).

In addition, the weighting is carried out differently. In Lee et al. (2012), the correction
function was fitted to the Lyα forest split into 3 restframe bins, with the weights in each
bin given by the inverse variance estimated through a bootstrap procedure; for our continua,
we instead fit the correction function directly to the individual pixels, with weights given
by the inverse of $\sigma^2 = \sigma_F^2 + \sigma_N^2$, where $\sigma_N^2$ is the corrected
(see § 4.2) pipeline noise variance and

$$\sigma_F^2(z) = 0.065\,[(1 + z_{\rm abs})/3.25]^{3.8}\,\langle F \rangle^2(z) \qquad (7)$$

is the intrinsic variance of the Lyα forest within a 69 km s$^{-1}$ pixel, as estimated from
the redshift evolution of the power spectrum (McDonald et al. 2006), and
$\langle F \rangle(z)$ is given by Equation 6. In this fit, we use only pixels with
$\lambda$ > 3625 Å in order to avoid the regions most severely affected by the sky noise
(c.f. Figure 5).
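
The regulation step can be sketched as a weighted linear least-squares problem. This is one plausible reading of the procedure rather than the authors' implementation: it assumes the model $f/C_{\rm PCA} \approx (a + b\,\lambda_{\rm rest})\,\langle F \rangle(z)$, converts the corrected pipeline noise to transmission units by dividing by $C_{\rm PCA}$, and uses hypothetical function names.

```python
import numpy as np

LAMBDA_ALPHA = 1215.67  # Angstrom (standard value, assumed)

def mean_flux(z):
    return np.exp(-0.001845 * (1.0 + z) ** 3.924)            # Equation 6

def forest_variance(z):
    return 0.065 * ((1.0 + z) / 3.25) ** 3.8 * mean_flux(z) ** 2   # Equation 7

def mean_flux_regulate(wave_obs, flux, sigma_cor, c_pca, z_qso, wave_min=3625.0):
    """Fit (a + b * lambda_rest) and return the regulated continuum plus (a, b)."""
    wave_obs = np.asarray(wave_obs, dtype=float)
    flux = np.asarray(flux, dtype=float)
    sigma_cor = np.asarray(sigma_cor, dtype=float)
    c_pca = np.asarray(c_pca, dtype=float)
    lam_rest = wave_obs / (1.0 + z_qso)
    z_abs = wave_obs / LAMBDA_ALPHA - 1.0
    use = (lam_rest > 1041.0) & (lam_rest < 1185.0) & (wave_obs > wave_min) & (c_pca > 0)
    trans = flux[use] / c_pca[use]
    var = forest_variance(z_abs[use]) + (sigma_cor[use] / c_pca[use]) ** 2
    w = 1.0 / np.sqrt(var)
    fbar = mean_flux(z_abs[use])
    design = np.column_stack([fbar, fbar * lam_rest[use]]) * w[:, None]
    (a, b), *_ = np.linalg.lstsq(design, trans * w, rcond=None)
    c_mf = c_pca.copy()
    blue = lam_rest < 1185.0                                   # correct only the forest side
    c_mf[blue] *= a + b * lam_rest[blue]
    return c_mf, a, b
```

Because the model is linear in $a$ and $b$, no iterative minimisation is needed; the discontinuity at 1185 Å arises from applying the correction only on the blue side.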

The mean-flux regulation corrections are applied to the initial continuum estimate,
$C_{\rm PCA}$, bluewards of 1185 Å. This introduces a discontinuity at 1185 Å in the final
continuum that is unphysical, but we do not expect any practical issues to arise from this
discontinuity if our assumed Lyα forest range is adopted. We have found that a small number
(~500) of extremely low SNR (S/N < 1) spectra have continua that go negative at some
wavelengths. Since this situation is clearly unphysical, we discard these quasars from the
overall sample.

For all 54,468 quasars in our sample, we provide estimated continua (in the CONT vector of
each file) that cover the quasar restframe range 1040-1600 Å; the continua outside of this
range are set to zero. From the tests on mock spectra by Lee (2012), we expect the typical
rms error of the MF-PCA continua to be around 6% at S/N ~ 2 per pixel (evaluated within the
forest), dropping to ~4% at higher SNR (S/N > 5 per pixel).

Several caveats must be kept in mind with regards to our continua. First, because the MF-PCA
method requires an external constraint of the Lyα forest mean flux evolution, the continua
presented here cannot be used to make an independent measurement of the Lyα forest mean flux:
they are primarily intended to provide a good per-pixel continuum estimate at the expense of
zeroth order information on the mean flux. Another possible issue is that the mean-flux
regulation removes large-scale flux power along the line-of-sight, which means that our
continua will not yield accurate measurements of one-dimensional flux power unless
corrections are applied (A. Font-Ribera, private communication). In addition, the method
would introduce some correlations in the continua in neighboring lines-of-sight.
Nevertheless, we do not expect this effect to bias measurements of the BAO peak position.

Fig. 9. Multiple observations of the same BOSS quasar, illustrating the effect of
differential atmospheric refraction on the spectrophotometry. The spectra have been smoothed
with a 5-pixel boxcar function for clarity. The spectrum with the plate-mjd-fiber combination
3615-55455-8 is the 'primary' spectrum catalogued in DR9Q. This is an unusually bright BOSS
quasar, with a magnitude g = 18.66 and S/N ≈ 23 per pixel.

5. KNOWN SYSTEMATICS

In this section, we describe several issues in the BOSS 
spectra that could have an impact on cosmological anal- 
yses. 

5.1. Spectrophotometric Errors 

To improve the blue-end signal-to-noise for Lyα forest analysis at z ~ 2, we have made the
following modifications in the way that quasar fibers are attached to the plug-plates on the
BOSS spectrograph: (a) thin (175-300 μm) washers were attached to the plate plug-holes to
provide an axial offset, and (b) the positions of the quasar fibers are offset by up to
~0.5" in order to maximize the light entering the fiber when taking into account the
atmospheric differential refraction (ADR) at the designed plate hour-angle (Dawson et al.
2012). These adjustments shift the effective focus from 5300 Å (as originally designed) to
~4000 Å, which improves the blue-end signal-to-noise for Lyα forest analysis. However, at
time of writing the flux standard stars are observed only through fibers without these
offsets, rendering the spectrophotometric calibration highly uncertain on the blue end. A
BOSS ancillary program is now in place to observe a number of spectrophotometric standard
stars through the quasar fibers in order to improve the spectrophotometric calibration, but
the results of this program will not be incorporated until future data releases.

Furthermore, the blue end of the spectrum is more susceptible to differential atmospheric
refraction, causing the spectrophotometry of the spectra to vary as a function of observed
zenith angle. This effect is illustrated in Figure 9, where we show three spectra of a BOSS
quasar that had been observed on multiple nights. An important consequence of this uncertain
spectrophotometry is that quasar continua cannot be directly extrapolated from redwards
($\lambda_{\rm rest}$ > 1216 Å) of the quasar Lyα emission line, e.g., using a power-law.
Direct extrapolation generally produces a large continuum error even in spectra with good
flux calibration, but the existing spectrophotometric errors in BOSS mean that direct
extrapolation will be biased on average (see Figure 5 in DR9Q).

Fig. 10. Top panel: A comparison of the extracted Lyα forest transmission fields of the
multiple observations shown in Figure 9, derived by applying the continuum estimation
described in § 4.4 on each individual observation. The transmission fields appear to be
visually consistent with each other. Middle panel: The pull distribution,
$\chi_i(\lambda)$ (Equation 8), from the multiple transmission fields in the top panel.
Bottom panel: The histogram of the pull distribution shown in the middle panel. It is
consistent with a Gaussian distribution with unit standard deviation, indicating our
continuum-fitting method has removed the spectrophotometric variations from the transmission
field.

However, the MF-PCA continua included with this sample ameliorate the spectrophotometric
errors. This effect is illustrated in Figure 10, where we compare the transmitted flux fields
extracted from the multiple observations of the same object shown in Figure 9, with MF-PCA
continua fitted to each individual spectrum. One sees from the top panel that the resultant
flux fields appear consistent with each other, within the noise, despite the large
differences in spectrophotometry as seen in Figure 9. We further quantify this by computing
another form of the pull:

$$\chi_i(\lambda) = \frac{F_i(\lambda) - \bar{F}(\lambda)}{\sigma_{{\rm cor},i}(\lambda)} , \qquad (8)$$

where $F_i$ is the transmitted (i.e. continuum-normalized) flux from the different
observations denoted by subscript $i$, $\bar{F}$ corresponds to the average of all the
observations at a given wavelength, and $\sigma_{\rm cor}$ is the corrected pipeline noise.
The values of $\chi$ from the multiple observations are shown in the middle panel of
Figure 10. The bottom panel shows the combined histogram of all the $\chi$ distributions,
which appears Gaussian with a standard deviation close to unity, implying that pixel noise is
sufficient to account for the variance in the derived transmission fields and that the
variance from the spectrophotometric errors has been corrected. Although this particular
quasar has unusually high S/N, we have shown that errors in the relative spectrophotometry do
not significantly affect our continuum estimates.
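
The pull of Equation 8 is straightforward to evaluate for repeat observations placed on a common wavelength grid. The sketch below assumes an unweighted mean over observations and a noise array already expressed in transmission units; the function name is hypothetical.

```python
import numpy as np

def transmission_pull(trans, sigma):
    """Pull chi_i(lambda) for repeat observations (cf. Equation 8).

    trans, sigma: arrays of shape (n_obs, n_pix) holding the continuum-normalized
    flux and its corrected noise on a common wavelength grid.
    """
    trans = np.asarray(trans, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mean = trans.mean(axis=0)        # average over observations at each wavelength
    return (trans - mean) / sigma
```

A histogram of the flattened output can then be compared with a unit-variance Gaussian, as in the bottom panel of Figure 10.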

5.2. Flux Calibration Artifacts 



Fig. 11. The ratio from dividing 28,848 quasars with g < 20.5 by their pipeline PCA models
(Bolton et al. 2012). The features correspond to Balmer line wavelengths (Hδ through H-7 are
shown as red vertical dotted lines), while the prominent absorption lines at 3933.7 Å and
3968.5 Å (blue vertical dashed lines) are possibly a consequence of Ca II H&K absorption by
the interstellar medium. [Horizontal axis: Wavelength (Å), 3600 to 5000.]

We showed in § 4.1.2 that imperfect subtraction of prominent sky emission lines can lead to
spectral artifacts if not carefully dealt with. However, imperfect flux calibration can also
lead to artifacts. This conversion from counts to flux is achieved, in part, by placing
fibers on F sub-dwarf stars and using them as spectrophotometric standards. The derived
calibration vectors are largely fixed for all fibers plugged into each plate, fed to each of
the two BOSS spectrographs. These vectors can be characterised as constant for fibers 1-500
and 501-1000 and therefore their flux calibration may vary from 'half-plate' to 'half-plate'.
These spectrophotometric standards show pronounced Balmer absorption lines and these must be
masked and interpolated over for accurate fluxing. There are potential systematic errors
associated with this procedure as discussed in the DR2 and DR6 release papers (Abazajian et
al. 2004; Adelman-McCarthy et al. 2006): these were ameliorated in the pipeline reduction of
those releases but seem to have reappeared in the DR9 spectra.

To illustrate these artifacts, in Figure 11 we stack the ratio of the flux and the best-fit
pipeline PCA model (Bolton et al. 2012) from all 28,848 good quasar spectra in the DR9 sample
where the observed spectroscopic r-band magnitude was brighter than 20.5 (CLASS='QSO',
ZWARNING=0, SPECTROSYNFLUX[2] > 6.3 nMgy). These ratios, and the formal pipeline errors, are
combined at each observer-frame (barycenter) wavelength using a weighted mean with 3-sigma
outlier rejection. We exclude any data points within 100 Å of 31 possible emission line
locations at the quasar redshift, blueward of Lyα, or where the template flux density is
lower than 0.5 erg/s/cm$^2$/Å. These exclusions imply that only the smooth quasar continuum
at $\lambda_{\rm rest}$ > 1216 Å contributes to the stack, while at $\lambda$ < 4000 Å only
low-redshift quasars at z < 2.0 contribute.
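
The combination step at a single observer-frame wavelength can be sketched as an iteratively clipped, inverse-variance weighted mean. The clipping rule used here (each point is compared with the current mean in units of its own formal error) is an assumption, since the text specifies only a weighted mean with 3-sigma outlier rejection; the function name is hypothetical.

```python
import numpy as np

def clipped_weighted_mean(values, errors, nsigma=3.0, max_iter=10):
    """Inverse-variance weighted mean with iterative n-sigma outlier rejection."""
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    keep = np.isfinite(values) & np.isfinite(errors) & (errors > 0)
    if not keep.any():
        return np.nan
    mean = np.nan
    for _ in range(max_iter):
        weights = 1.0 / errors[keep] ** 2
        mean = np.sum(weights * values[keep]) / np.sum(weights)
        new_keep = keep & (np.abs(values - mean) <= nsigma * errors)
        # Stop when nothing more is rejected, or when clipping would remove everything.
        if not new_keep.any() or new_keep.sum() == keep.sum():
            break
        keep = new_keep
    return mean
```

Applying this function to the flux-to-model ratios at each wavelength in turn yields a stacked ratio of the kind shown in Figure 11.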

In the resulting ratio shown in Figure 11, we see unwanted wavelength-dependent structure at
the ~2-3% level. The prominent Ca II H&K absorption lines, at 3968.5 Å and 3933.7 Å
respectively, are thought to be some combination of absorption by the solar neighbourhood,
the interstellar medium and the Milky Way halo. In addition, artifacts are present at Balmer
transition wavelengths due to imperfect correction of standard star absorption lines.

At time of writing, this issue has not yet been fully
corrected in the BOSS pipeline, so users must take this
effect into account in their analyses. As an interim
solution, the ratio shown in Figure 11 can be used as a
correction vector and has been made publicly available
with our sample (see § 6 for download instructions):
the DR9 pipeline fluxes should be divided by this
correction vector to remove the Balmer features, and other
fluxing artifacts, on average. This correction was applied
to the spectra prior to the continuum fitting process in
§ 4.4, but it is not otherwise incorporated into the fluxes
in individual "speclya" spectra; users need to carry out
this procedure themselves.
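
As a rough guide to carrying out this step, the sketch below
interpolates a correction vector onto a spectrum's own
observed-frame wavelength grid and divides the flux by it. It
is written in Python with NumPy under stated assumptions: the
function name is hypothetical, the correction is assumed to
have already been loaded as two arrays (wavelength in Å and
the dimensionless ratio), and linear interpolation is our
choice rather than a documented prescription.

```python
import numpy as np

def apply_flux_correction(wave, flux, ivar, corr_wave, corr):
    """Divide pipeline flux by an interpolated correction vector.

    wave, flux, ivar : observed-frame wavelength (Angstrom), pipeline
                       flux and inverse variance of one spectrum
    corr_wave, corr  : correction vector on its own grid (assumed to be
                       sorted by increasing wavelength)
    """
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    ivar = np.asarray(ivar, dtype=float)
    # np.interp holds the end values constant outside the tabulated range.
    c = np.interp(wave, corr_wave, corr)
    flux_corr = flux / c
    # The noise scales as 1/c, so the inverse variance scales as c**2.
    ivar_corr = ivar * c**2
    return flux_corr, ivar_corr
```

Rescaling the inverse variance alongside the flux keeps the
signal-to-noise ratio of each pixel unchanged.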

It should be noted that Busca et al. (2012) find that
the magnitude of these artifacts is comparable for the
two BOSS spectrographs and that the square root of
the half-plate-to-half-plate variance is no larger than
20-100% of the mean deviation (depending on the test
applied). They conclude that the error introduced by
half-plate-wide deviations from this correction vector is
insignificant for their analysis.

6. DATA ACCESS AND USAGE GUIDELINES 

The files associated with the BOSS DR9 Lyα Forest
Sample described in this paper can be downloaded
from the SDSS-III website (see footnote 38). We have
generated BOSSLyaDR9_cat, a catalog listing the objects in
this sample along with additional information useful for
Lyα forest analysis (described in Table 3). It is available
in both FITS and ASCII formats.

The main components of the sample are individual
'speclya' spectral files corresponding to each object in
our sample. These files are a value-added version of the
'lite' per-object BOSS format (described earlier in this
paper), but with additional masks and corrections as listed
in Table 1. Note that these masks and corrections have not
been applied to the pipeline flux, f_p, or the inverse
variances, w_p = σ_p^{-2}, by default, but are included as
separate vectors in each file. The flux correction described
in § 5.2 is available in a separate file,
residcorr_v5_4_45.dat, that can also be downloaded from the
aforementioned website.

For a standard analysis, users should use all objects
listed by their unique plate-MJD-fiber combination in
the catalog; each object has a corresponding
"speclya" spectrum file labeled by plate-MJD-fiber,
grouped in subdirectories by plate number. The Lyα
forest pixels in the range 1041 Å < λ_rest < 1185 Å
should be selected from each spectrum in the catalog,
where the quasar restframe is defined with respect to the
redshift given by the Z_VI (visual inspection redshift)
field in the catalog. Pixels with zero inverse variance
or non-zero bits in the MASK_COMB vector should then
be discarded or masked. The pipeline flux, f_p (FLUX in
the speclya files), is then divided by the flux calibration
correction (see footnote 39), c_flux, and multiplied by the
DLA damping wing corrections, c_dla (DLA_CORR), before
dividing by the MF-PCA continua, C_MF (CONT), to obtain the
transmitted Lyα forest flux. The same operations are applied
to the pipeline noise, σ_p (although this is stored as the
inverse variance, w_p = σ_p^{-2}, IVAR in the data files), but
with the additional step of dividing by the noise
corrections cor_tot (NOISE_CORR).

38 http://www.sdss3.org/dr9/algorithms/lyaf_sample.php

39 Interpolated to the individual wavelength grids from
residcorr_v5_4_45.dat described in § 5.2.

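In other words, the Lyα forest transmission field, F_i,
is extracted from each spectrum i like so:

$$ F_i(z_\alpha) = f_{p,i}(\lambda) \left( \frac{c_{\mathrm{dla},i}(\lambda)}{c_{\mathrm{flux}}(\lambda)\, C_{\mathrm{MF},i}(\lambda)} \right), \tag{9} $$

where (1 + z_α) = λ / 1215.67 Å.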

The corresponding inverse variance weights are derived
from the pipeline inverse variances, w_{p,i}, as follows:

$$ w_{F,i}(z_\alpha) = w_{p,i}(\lambda)\, \mathrm{cor}_{\mathrm{tot}}^{2}(\lambda) \left( \frac{c_{\mathrm{flux}}(\lambda)\, C_{\mathrm{MF},i}(\lambda)}{c_{\mathrm{dla},i}(\lambda)} \right)^{2}. \tag{10} $$

All pixels with MASK_COMB set or w_{p,i} = 0 should be
masked or discarded.
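
The recipe of Equations 9 and 10 can be written compactly as
the following sketch in Python with NumPy. It is an
illustration under stated assumptions, not the collaboration's
code: the function name and argument names are hypothetical,
the vectors are assumed to have been read from a speclya file
beforehand (FLUX, IVAR, MASK_COMB, DLA_CORR, CONT, NOISE_CORR),
the flux calibration correction is assumed to be already
interpolated onto the spectrum's wavelength grid, and the
extra guards against non-positive denominators are our
addition.

```python
import numpy as np

LYA_REST = 1215.67  # Lya rest wavelength in Angstrom

def extract_forest(wave, flux, ivar, mask_comb, dla_corr, cont,
                   noise_corr, flux_corr, z_qso,
                   rest_min=1041.0, rest_max=1185.0):
    """Return (z_abs, F, w_F) for the usable Lya forest pixels."""
    wave, flux, ivar, dla_corr, cont, noise_corr, flux_corr = (
        np.asarray(a, dtype=float)
        for a in (wave, flux, ivar, dla_corr, cont, noise_corr, flux_corr))
    mask_comb = np.asarray(mask_comb)

    # Forest range in the quasar rest frame (z_qso would be Z_VI).
    rest = wave / (1.0 + z_qso)
    good = (rest > rest_min) & (rest < rest_max)
    # Discard masked pixels and pixels with zero inverse variance.
    good &= (mask_comb == 0) & (ivar > 0)
    # Assumed safety guards so that no division by zero occurs.
    good &= (cont > 0) & (flux_corr > 0) & (dla_corr > 0)

    # Eq. 9: F = f_p * c_dla / (c_flux * C_MF)
    scale = dla_corr[good] / (flux_corr[good] * cont[good])
    F = flux[good] * scale
    # Eq. 10: w_F = w_p * cor_tot^2 * (c_flux * C_MF / c_dla)^2
    w_F = ivar[good] * noise_corr[good] ** 2 / scale ** 2
    z_abs = wave[good] / LYA_REST - 1.0
    return z_abs, F, w_F
```

Returning only the surviving pixels keeps the three output
arrays aligned, which is convenient when concatenating many
sightlines for a clustering measurement.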

7. CONCLUSIONS 

We present the public release of the BOSS DR9 Lyα
Forest Sample, a set of 54,468 spectra suitable for Lyα
forest analysis selected from the BOSS DR9 quasar catalog,
taking into account criteria such as redshift, SNR,
and quality of spectra. For each spectrum, we also provide
the following products designed to aid in Lyα forest
analysis:

• A simple maskbit system to flag pixels that may be 
affected by pipeline artifacts or sky emission lines, 
or that lie within DLA cores. 

• Corrections for DLA damping wings. 

• Noise correction vectors to make the pipeline noise 
estimate consistent with the actual pixel disper- 
sions. 

• An MF-PCA continuum estimate accurate to 5% 
rms at the median S/N of the data. 

In addition, we have also discussed two systematics in
the data that may affect Lyα forest analysis. The relative
spectrophotometry is uncertain due to steps in the
observational procedure taken to boost the Lyα forest
SNR, but we argue that the MF-PCA continua provided
here remove these effects to first order. We also discuss
artifacts in the spectra caused by errors in the
flux calibration, and provide a global correction as an
interim solution prior to a more thorough solution within
the BOSS pipeline.

While this sample is a convenient resource for users
intending to work with the BOSS Lyα forest data, we
encourage users to make their own decisions on cuts and
corrections, as necessary, to optimize their analysis. This
compilation also serves as a fiducial sample: to enable
straightforward cross-comparison, users should run their
analysis on the full sample with the value-added products
fully implemented (§ 6), in addition to analyses
incorporating alternative cuts, corrections, or continuum
normalizations. The BOSS Collaboration has adopted
this strategy for our Lyα forest BAO analysis.

The BOSS DR9 Lyα Forest Sample is an unprecedented
data set: it encompasses a comoving volume
of ~20 h^-3 Gpc^3 and represents a dense sampling at
~16 quasar sightlines per square degree. We hope that
readers who have not previously worked with Lyα forest
data will take advantage of this unique data set to
make their own contribution to our understanding of the
high-redshift universe.